Data sets included in R

# Datasets included in base R
library(help = "datasets")

# All datasets included in the packages you have installed
data(package = .packages(all.available = TRUE))

Identifying and Handling Missing Data

First, load the tidyverse package

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)

Identifying Missing Data in R

Missing data can be problematic in analysis, and it is crucial to identify and handle them appropriately. In R, missing data is typically represented as NA. However, datasets may encode missing values in other ways, such as "N/A", "-", ".", or 999. We can, and should, convert these into proper NAvalues.

Checking for Missing Data:

The is.na() function is useful for identifying missing values in a dataset:

# Example dataset
data <- data.frame(
  id = 1:5,
  value = c(10, "N/A", 15, "-", 20)
)

# Convert character encodings of missing values to NA
data$value[data$value %in% c("N/A", "-")] <- NA

# Convert column to numeric type
data$value <- as.numeric(data$value)

# Check missing data
is.na(data)
##         id value
## [1,] FALSE FALSE
## [2,] FALSE  TRUE
## [3,] FALSE FALSE
## [4,] FALSE  TRUE
## [5,] FALSE FALSE
table(data)
##    value
## id  10 15 20
##   1  1  0  0
##   2  0  0  0
##   3  0  1  0
##   4  0  0  0
##   5  0  0  1

Handling Missing Data

There are multiple strategies to handle missing data:

  • Omitting missing data: Use na.omit() to remove rows with NA values. This is often NOT a good idea. It requires knowing your data, where your missing data values are coming from, why they are missing, and what your analytical goals are.

  • Mean substitution: Replace missing values with the mean of the non-missing values.

  • Data imputation: The process of replacing missing values in a dataset with estimated values based on other available data. This helps maintain dataset completeness and reduces bias introduced by missing observations. The process can range from simple methods like replacing missing values with the mean, median, or mode of a variable to more complex approaches such as regression imputation, k-nearest neighbors (KNN), or multiple imputation, which use predictive modeling techniques to estimate missing values. The choice of method depends on the nature of the data and the assumptions that can be made about the missing values.

Quick Note: In R, indexing with brackets [ ] allows you to access specific elements within vectors, matrices, and data frames. For vectors, a single bracket [ ] is used to extract elements by position, logical condition, or name. For example, x[2] retrieves the second element of a vector x, while x[x > 5] returns elements greater than 5. For matrices and data frames, two brackets [ , ] are used to specify rows and columns. df[1, 2] retrieves the value in the first row and second column of df, while df[ , "column_name"] extracts an entire column by name. Using df[ , 2] returns the second column, and df[1, ] retrieves the entire first row.

Example:

# Mean substitution
data$value[is.na(data$value)] <- mean(data$value, na.rm = TRUE)

# Omit missing data
data_complete <- na.omit(data)

Using stringr for String Manipulation

The stringr package helps in handling and manipulating character data. Let’s use the starwars data set from dplyr.

A long time ago, in a galaxy far, far away...

Example:

glimpse(starwars)
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue", "blue",…
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0, 57.0, …
## $ sex        <chr> "male", "none", "none", "male", "female", "male", "female",…
## $ gender     <chr> "masculine", "masculine", "masculine", "masculine", "femini…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", "T…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human", "Huma…
## $ films      <list> <"A New Hope", "The Empire Strikes Back", "Return of the J…
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <>, <>, <>, "Imp…
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TIE Advanced x1",…
# Remove white spaces (there are none)
starwars$name <- trimws(starwars$name)
starwars$name
##  [1] "Luke Skywalker"        "C-3PO"                 "R2-D2"                
##  [4] "Darth Vader"           "Leia Organa"           "Owen Lars"            
##  [7] "Beru Whitesun Lars"    "R5-D4"                 "Biggs Darklighter"    
## [10] "Obi-Wan Kenobi"        "Anakin Skywalker"      "Wilhuff Tarkin"       
## [13] "Chewbacca"             "Han Solo"              "Greedo"               
## [16] "Jabba Desilijic Tiure" "Wedge Antilles"        "Jek Tono Porkins"     
## [19] "Yoda"                  "Palpatine"             "Boba Fett"            
## [22] "IG-88"                 "Bossk"                 "Lando Calrissian"     
## [25] "Lobot"                 "Ackbar"                "Mon Mothma"           
## [28] "Arvel Crynyd"          "Wicket Systri Warrick" "Nien Nunb"            
## [31] "Qui-Gon Jinn"          "Nute Gunray"           "Finis Valorum"        
## [34] "Padmé Amidala"         "Jar Jar Binks"         "Roos Tarpals"         
## [37] "Rugor Nass"            "Ric Olié"              "Watto"                
## [40] "Sebulba"               "Quarsh Panaka"         "Shmi Skywalker"       
## [43] "Darth Maul"            "Bib Fortuna"           "Ayla Secura"          
## [46] "Ratts Tyerel"          "Dud Bolt"              "Gasgano"              
## [49] "Ben Quadinaros"        "Mace Windu"            "Ki-Adi-Mundi"         
## [52] "Kit Fisto"             "Eeth Koth"             "Adi Gallia"           
## [55] "Saesee Tiin"           "Yarael Poof"           "Plo Koon"             
## [58] "Mas Amedda"            "Gregar Typho"          "Cordé"                
## [61] "Cliegg Lars"           "Poggle the Lesser"     "Luminara Unduli"      
## [64] "Barriss Offee"         "Dormé"                 "Dooku"                
## [67] "Bail Prestor Organa"   "Jango Fett"            "Zam Wesell"           
## [70] "Dexter Jettster"       "Lama Su"               "Taun We"              
## [73] "Jocasta Nu"            "R4-P17"                "Wat Tambor"           
## [76] "San Hill"              "Shaak Ti"              "Grievous"             
## [79] "Tarfful"               "Raymus Antilles"       "Sly Moore"            
## [82] "Tion Medon"            "Finn"                  "Rey"                  
## [85] "Poe Dameron"           "BB8"                   "Captain Phasma"
# Remove only the first the blank space
starwars$name <- str_remove(starwars$name, " ")
starwars$name
##  [1] "LukeSkywalker"        "C-3PO"                "R2-D2"               
##  [4] "DarthVader"           "LeiaOrgana"           "OwenLars"            
##  [7] "BeruWhitesun Lars"    "R5-D4"                "BiggsDarklighter"    
## [10] "Obi-WanKenobi"        "AnakinSkywalker"      "WilhuffTarkin"       
## [13] "Chewbacca"            "HanSolo"              "Greedo"              
## [16] "JabbaDesilijic Tiure" "WedgeAntilles"        "JekTono Porkins"     
## [19] "Yoda"                 "Palpatine"            "BobaFett"            
## [22] "IG-88"                "Bossk"                "LandoCalrissian"     
## [25] "Lobot"                "Ackbar"               "MonMothma"           
## [28] "ArvelCrynyd"          "WicketSystri Warrick" "NienNunb"            
## [31] "Qui-GonJinn"          "NuteGunray"           "FinisValorum"        
## [34] "PadméAmidala"         "JarJar Binks"         "RoosTarpals"         
## [37] "RugorNass"            "RicOlié"              "Watto"               
## [40] "Sebulba"              "QuarshPanaka"         "ShmiSkywalker"       
## [43] "DarthMaul"            "BibFortuna"           "AylaSecura"          
## [46] "RattsTyerel"          "DudBolt"              "Gasgano"             
## [49] "BenQuadinaros"        "MaceWindu"            "Ki-Adi-Mundi"        
## [52] "KitFisto"             "EethKoth"             "AdiGallia"           
## [55] "SaeseeTiin"           "YaraelPoof"           "PloKoon"             
## [58] "MasAmedda"            "GregarTypho"          "Cordé"               
## [61] "ClieggLars"           "Pogglethe Lesser"     "LuminaraUnduli"      
## [64] "BarrissOffee"         "Dormé"                "Dooku"               
## [67] "BailPrestor Organa"   "JangoFett"            "ZamWesell"           
## [70] "DexterJettster"       "LamaSu"               "TaunWe"              
## [73] "JocastaNu"            "R4-P17"               "WatTambor"           
## [76] "SanHill"              "ShaakTi"              "Grievous"            
## [79] "Tarfful"              "RaymusAntilles"       "SlyMoore"            
## [82] "TionMedon"            "Finn"                 "Rey"                 
## [85] "PoeDameron"           "BB8"                  "CaptainPhasma"
# Remove ALL the blank spaces
starwars$name <- str_remove_all(starwars$name, " ")
starwars$name
##  [1] "LukeSkywalker"       "C-3PO"               "R2-D2"              
##  [4] "DarthVader"          "LeiaOrgana"          "OwenLars"           
##  [7] "BeruWhitesunLars"    "R5-D4"               "BiggsDarklighter"   
## [10] "Obi-WanKenobi"       "AnakinSkywalker"     "WilhuffTarkin"      
## [13] "Chewbacca"           "HanSolo"             "Greedo"             
## [16] "JabbaDesilijicTiure" "WedgeAntilles"       "JekTonoPorkins"     
## [19] "Yoda"                "Palpatine"           "BobaFett"           
## [22] "IG-88"               "Bossk"               "LandoCalrissian"    
## [25] "Lobot"               "Ackbar"              "MonMothma"          
## [28] "ArvelCrynyd"         "WicketSystriWarrick" "NienNunb"           
## [31] "Qui-GonJinn"         "NuteGunray"          "FinisValorum"       
## [34] "PadméAmidala"        "JarJarBinks"         "RoosTarpals"        
## [37] "RugorNass"           "RicOlié"             "Watto"              
## [40] "Sebulba"             "QuarshPanaka"        "ShmiSkywalker"      
## [43] "DarthMaul"           "BibFortuna"          "AylaSecura"         
## [46] "RattsTyerel"         "DudBolt"             "Gasgano"            
## [49] "BenQuadinaros"       "MaceWindu"           "Ki-Adi-Mundi"       
## [52] "KitFisto"            "EethKoth"            "AdiGallia"          
## [55] "SaeseeTiin"          "YaraelPoof"          "PloKoon"            
## [58] "MasAmedda"           "GregarTypho"         "Cordé"              
## [61] "ClieggLars"          "PoggletheLesser"     "LuminaraUnduli"     
## [64] "BarrissOffee"        "Dormé"               "Dooku"              
## [67] "BailPrestorOrgana"   "JangoFett"           "ZamWesell"          
## [70] "DexterJettster"      "LamaSu"              "TaunWe"             
## [73] "JocastaNu"           "R4-P17"              "WatTambor"          
## [76] "SanHill"             "ShaakTi"             "Grievous"           
## [79] "Tarfful"             "RaymusAntilles"      "SlyMoore"           
## [82] "TionMedon"           "Finn"                "Rey"                
## [85] "PoeDameron"          "BB8"                 "CaptainPhasma"
# Remove ALL the dashes
starwars$name <- str_remove_all(starwars$name, "-")
starwars$name
##  [1] "LukeSkywalker"       "C3PO"                "R2D2"               
##  [4] "DarthVader"          "LeiaOrgana"          "OwenLars"           
##  [7] "BeruWhitesunLars"    "R5D4"                "BiggsDarklighter"   
## [10] "ObiWanKenobi"        "AnakinSkywalker"     "WilhuffTarkin"      
## [13] "Chewbacca"           "HanSolo"             "Greedo"             
## [16] "JabbaDesilijicTiure" "WedgeAntilles"       "JekTonoPorkins"     
## [19] "Yoda"                "Palpatine"           "BobaFett"           
## [22] "IG88"                "Bossk"               "LandoCalrissian"    
## [25] "Lobot"               "Ackbar"              "MonMothma"          
## [28] "ArvelCrynyd"         "WicketSystriWarrick" "NienNunb"           
## [31] "QuiGonJinn"          "NuteGunray"          "FinisValorum"       
## [34] "PadméAmidala"        "JarJarBinks"         "RoosTarpals"        
## [37] "RugorNass"           "RicOlié"             "Watto"              
## [40] "Sebulba"             "QuarshPanaka"        "ShmiSkywalker"      
## [43] "DarthMaul"           "BibFortuna"          "AylaSecura"         
## [46] "RattsTyerel"         "DudBolt"             "Gasgano"            
## [49] "BenQuadinaros"       "MaceWindu"           "KiAdiMundi"         
## [52] "KitFisto"            "EethKoth"            "AdiGallia"          
## [55] "SaeseeTiin"          "YaraelPoof"          "PloKoon"            
## [58] "MasAmedda"           "GregarTypho"         "Cordé"              
## [61] "ClieggLars"          "PoggletheLesser"     "LuminaraUnduli"     
## [64] "BarrissOffee"        "Dormé"               "Dooku"              
## [67] "BailPrestorOrgana"   "JangoFett"           "ZamWesell"          
## [70] "DexterJettster"      "LamaSu"              "TaunWe"             
## [73] "JocastaNu"           "R4P17"               "WatTambor"          
## [76] "SanHill"             "ShaakTi"             "Grievous"           
## [79] "Tarfful"             "RaymusAntilles"      "SlyMoore"           
## [82] "TionMedon"           "Finn"                "Rey"                
## [85] "PoeDameron"          "BB8"                 "CaptainPhasma"
# Make lowercase
starwars$name <- str_to_lower(starwars$name)
starwars$name
##  [1] "lukeskywalker"       "c3po"                "r2d2"               
##  [4] "darthvader"          "leiaorgana"          "owenlars"           
##  [7] "beruwhitesunlars"    "r5d4"                "biggsdarklighter"   
## [10] "obiwankenobi"        "anakinskywalker"     "wilhufftarkin"      
## [13] "chewbacca"           "hansolo"             "greedo"             
## [16] "jabbadesilijictiure" "wedgeantilles"       "jektonoporkins"     
## [19] "yoda"                "palpatine"           "bobafett"           
## [22] "ig88"                "bossk"               "landocalrissian"    
## [25] "lobot"               "ackbar"              "monmothma"          
## [28] "arvelcrynyd"         "wicketsystriwarrick" "niennunb"           
## [31] "quigonjinn"          "nutegunray"          "finisvalorum"       
## [34] "padméamidala"        "jarjarbinks"         "roostarpals"        
## [37] "rugornass"           "ricolié"             "watto"              
## [40] "sebulba"             "quarshpanaka"        "shmiskywalker"      
## [43] "darthmaul"           "bibfortuna"          "aylasecura"         
## [46] "rattstyerel"         "dudbolt"             "gasgano"            
## [49] "benquadinaros"       "macewindu"           "kiadimundi"         
## [52] "kitfisto"            "eethkoth"            "adigallia"          
## [55] "saeseetiin"          "yaraelpoof"          "plokoon"            
## [58] "masamedda"           "gregartypho"         "cordé"              
## [61] "cliegglars"          "pogglethelesser"     "luminaraunduli"     
## [64] "barrissoffee"        "dormé"               "dooku"              
## [67] "bailprestororgana"   "jangofett"           "zamwesell"          
## [70] "dexterjettster"      "lamasu"              "taunwe"             
## [73] "jocastanu"           "r4p17"               "wattambor"          
## [76] "sanhill"             "shaakti"             "grievous"           
## [79] "tarfful"             "raymusantilles"      "slymoore"           
## [82] "tionmedon"           "finn"                "rey"                
## [85] "poedameron"          "bb8"                 "captainphasma"

Recoding Variables

Recoding variables helps in transforming data for better analysis. We can use mutate() and case_when() from the dplyr package.

Example:

# Look at our data
table(starwars$species)
## 
##         Aleena       Besalisk         Cerean       Chagrian       Clawdite 
##              1              1              1              1              1 
##          Droid            Dug           Ewok      Geonosian         Gungan 
##              6              1              1              1              3 
##          Human           Hutt       Iktotchi        Kaleesh       Kaminoan 
##             35              1              1              1              2 
##        Kel Dor       Mirialan   Mon Calamari           Muun       Nautolan 
##              1              2              1              1              1 
##      Neimodian         Pau'an       Quermian         Rodian        Skakoan 
##              1              1              1              1              1 
##      Sullustan     Tholothian        Togruta          Toong      Toydarian 
##              1              1              1              1              1 
##     Trandoshan        Twi'lek     Vulptereen        Wookiee          Xexto 
##              1              2              1              2              1 
## Yoda's species         Zabrak 
##              1              2
# Recode a qualitative variable's value labels:
starwars <- starwars %>%
  mutate(species = case_when(
    species != "Human" ~ "NonHuman",
    species == "Human" ~ "Human"
  ))

# Look at the recoded data
table(starwars$species)
## 
##    Human NonHuman 
##       35       48
# Recode a qualitative variable `mass_group` from a quantitative variable `mass`:
## Histogram of mass of charachters
hist(starwars$mass, breaks = 20, xlab = "Mass", main = "Historgram of mass of StarWars charachters")
abline(v = mean(starwars$mass, na.rm = TRUE), col = "red", lwd = 2)

## Recode mass
starwars <- starwars %>%
  mutate(mass_group = case_when(
    mass < mean(mass, na.rm = TRUE) ~ "BelowMean",
    mass >= mean(mass, na.rm = TRUE) ~ "AboveMean"
  ))

## Look at recoded mass data
table(starwars$mass_group)
## 
## AboveMean BelowMean 
##        10        49
## If we wanted to rename all the variables:
df <- tibble("First Name" = c("Alice", "Bob"), "Last Name" = c("Smith", "Jones"))

# Remove spaces and convert to camelCase
colnames(df) <- str_replace_all(colnames(df), " ", "_")
colnames(df) <- str_to_lower(colnames(df))

print(df)
## # A tibble: 2 × 2
##   first_name last_name
##   <chr>      <chr>    
## 1 Alice      Smith    
## 2 Bob        Jones

Selecting variables

Selecting specific variables from a dataset helps streamline analysis and is useful for keeping your distinct types of data in their own dataframes. The select() function from dplyr allows us to extract specific columns from a dataset.

Example:

starwars_data <- starwars
# Select specific columns
starwars_data_anthropometrics <- starwars_data %>% 
  select(name, height, mass) # Select only these specific columns
print(head(starwars_data_anthropometrics))
## # A tibble: 6 × 3
##   name          height  mass
##   <chr>          <int> <dbl>
## 1 lukeskywalker    172    77
## 2 c3po             167    75
## 3 r2d2              96    32
## 4 darthvader       202   136
## 5 leiaorgana       150    49
## 6 owenlars         178   120
# Select columns based on a pattern
starwars_data_demographics <- starwars_data %>% 
  select(name, contains("_")) # Selects name and any column that contains an underscore
print(head(starwars_data_demographics))
## # A tibble: 6 × 6
##   name          hair_color  skin_color  eye_color birth_year mass_group
##   <chr>         <chr>       <chr>       <chr>          <dbl> <chr>     
## 1 lukeskywalker blond       fair        blue            19   BelowMean 
## 2 c3po          <NA>        gold        yellow         112   BelowMean 
## 3 r2d2          <NA>        white, blue red             33   BelowMean 
## 4 darthvader    none        white       yellow          41.9 AboveMean 
## 5 leiaorgana    brown       light       brown           19   BelowMean 
## 6 owenlars      brown, grey light       blue            52   AboveMean

Joining and Merging Data

Merging datasets is a key skill when dealing with real-world data. We will focus on left_join() and inner_join() from dplyr.

Example of types of joins

Left Join Example

A left_join() keeps all rows from the left dataset and adds matching data from the right dataset. If no match is found, NA values are introduced.

# Example datasets
data1 <- data.frame(id = 1:5, name = c("Alice", "Bob", "Charlie", "David", "Eva"), 
                    score = c(85, 90, 88, 92, 95))
data2 <- data.frame(id = c(3, 4, 5, 6), 
                    grade = c("B+", "A", "A-", "B"))

head(data1)
##   id    name score
## 1  1   Alice    85
## 2  2     Bob    90
## 3  3 Charlie    88
## 4  4   David    92
## 5  5     Eva    95
head(data2)
##   id grade
## 1  3    B+
## 2  4     A
## 3  5    A-
## 4  6     B
# Left Join
data_merged <- left_join(data1, data2, by = "id")
print(data_merged)
##   id    name score grade
## 1  1   Alice    85  <NA>
## 2  2     Bob    90  <NA>
## 3  3 Charlie    88    B+
## 4  4   David    92     A
## 5  5     Eva    95    A-

Inner Join Example

An inner_join() keeps only the rows that have matching keys in both datasets.

# Inner Join
data_inner <- inner_join(data1, data2, by = "id")
print(data_inner)
##   id    name score grade
## 1  3 Charlie    88    B+
## 2  4   David    92     A
## 3  5     Eva    95    A-

Hands-On Exercise: Working with World Bank Data

Step 1: Download Data

  • Go to the World Bank website and find a dataset of interest, for example, World Development Indicators.

WorldBank data

  • Select all countries.

  • Select at least two specific variables (under Series), e.g., GPD (Current US$), People using safely mangaged drinking water services (% of population)

Click the i for operational definition: GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Dollar figures for GDP are converted from domestic currencies using single year official exchange rates. For a few countries where the official exchange rate does not reflect the rate effectively applied to actual foreign exchange transactions, an alternative conversion factor is used.

  • Select one year (e.g., 2023) (or two if you want a challenge!).

  • You might need to click on Apply changes.

Selecting WorldBank data

  • Download the dataset in CSV format.

Example of downloaded WorldBank data

  • Inspect the CSV file

Question: Do you need to do any editing to the CSV file before it is ready to read in to R?

Step 2a: Import data into R

  • Move your csv file you downloaded into either A) your projects folder you are using, B) the working directory you use in R, or C) be prepared to read that file into R based on where it is located on your machine (i.e., the file location’s path). Protip: Option A is the easiest, but it requires you to be using Projects in R Studio.

Step 2b: Import additional country meta data into R

url <- "https://raw.githubusercontent.com/zhgarfield/raw2refined/main/data/country_metadata.csv"
country_metadata <- read.csv(url, stringsAsFactors = FALSE)

# Check the first few rows
head(country_metadata)
##             name alpha.2 alpha.3 country.code    iso_3166.2  region
## 1    Afghanistan      AF     AFG            4 ISO 3166-2:AF    Asia
## 2  Ã…land Islands      AX     ALA          248 ISO 3166-2:AX  Europe
## 3        Albania      AL     ALB            8 ISO 3166-2:AL  Europe
## 4        Algeria      DZ     DZA           12 ISO 3166-2:DZ  Africa
## 5 American Samoa      AS     ASM           16 ISO 3166-2:AS Oceania
## 6        Andorra      AD     AND           20 ISO 3166-2:AD  Europe
##        sub.region intermediate.region region.code sub.region.code
## 1   Southern Asia                             142              34
## 2 Northern Europe                             150             154
## 3 Southern Europe                             150              39
## 4 Northern Africa                               2              15
## 5       Polynesia                               9              61
## 6 Southern Europe                             150              39
##   intermediate.region.code
## 1                       NA
## 2                       NA
## 3                       NA
## 4                       NA
## 5                       NA
## 6                       NA
  • Merge these two datasets together…but, how and when???

Step 3: Tidy your data

How do these data violate tidy data principles?

What do we need to do to make them tidy? * Do we need to reshape? * Is there missing data to omit? * Do you want to adjust variable names? * Any name changes to make match? * Join?

Tidy your data!

From (openscapes.org)[https://openscapes.org/blog/2020-10-12-tidy-data/]